Distributed Value Function Approximation for Collaborative Multiagent Reinforcement Learning
Authors
Abstract
In this article, we propose several novel distributed gradient-based temporal-difference algorithms for multiagent off-policy learning of a linear approximation of the value function in Markov decision processes with strict information structure constraints, limiting interagent communications to small neighborhoods. The algorithms are composed of the following: first, local parameter updates based on single-agent gradient temporal-difference algorithms, including eligibility traces with state-dependent parameters, and, second, stochastic time-varying consensus schemes represented by directed graphs. The proposed algorithms differ in their form, the definition of the eligibility traces, the selection of time scales, and the way of incorporating consensus iterations. The main contribution of this article is a convergence analysis based on general properties of the underlying Feller-Markov model. We prove, under general assumptions, that the estimates generated by all the algorithms weakly converge to the solutions of the corresponding ordinary differential equations with precisely defined invariant sets. It is also demonstrated how the adopted methodology can be applied under weaker information structure constraints. The variance reduction effect of the algorithms is shown by formulating and analyzing an asymptotic equation. Specific guidelines for communication network design are provided. The algorithms' superior properties are illustrated by characteristic simulation results.
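To make the local-update-plus-consensus structure described above concrete, the following is a minimal Python sketch. It substitutes an ordinary on-policy TD(lambda) update for the paper's off-policy gradient temporal-difference updates, and a fixed row-stochastic weight matrix over a directed ring for the time-varying consensus schemes; the feature map, rewards, step sizes, and all constants are illustrative assumptions, not taken from the paper.

```python
# Sketch only: local TD(lambda) updates per agent + consensus averaging
# over a directed communication graph. Not the authors' exact algorithm.
import numpy as np

N_AGENTS, FEATURE_DIM, N_STATES = 4, 8, 20
GAMMA, LAMBDA, ALPHA = 0.95, 0.7, 0.05

rng = np.random.default_rng(0)
phi = rng.normal(size=(N_STATES, FEATURE_DIM))        # shared linear feature map
P = rng.dirichlet(np.ones(N_STATES), size=N_STATES)   # common Markov chain (row-stochastic)
rewards = rng.normal(size=(N_AGENTS, N_STATES))       # each agent observes its own reward

# Row-stochastic consensus weights for a directed ring: each agent averages
# its parameters with those of its single in-neighbor.
W = 0.5 * (np.eye(N_AGENTS) + np.roll(np.eye(N_AGENTS), 1, axis=1))

theta = np.zeros((N_AGENTS, FEATURE_DIM))   # local parameter vectors
traces = np.zeros((N_AGENTS, FEATURE_DIM))  # local eligibility traces
state = 0
for t in range(5000):
    next_state = rng.choice(N_STATES, p=P[state])
    for i in range(N_AGENTS):
        # Local temporal-difference error from the agent's own reward.
        delta = (rewards[i, state]
                 + GAMMA * phi[next_state] @ theta[i]
                 - phi[state] @ theta[i])
        traces[i] = GAMMA * LAMBDA * traces[i] + phi[state]
        theta[i] = theta[i] + ALPHA * delta * traces[i]
    theta = W @ theta                        # consensus step over the directed graph
    state = next_state

# Agents should approach agreement on a common parameter vector.
print("disagreement:", np.linalg.norm(theta - theta.mean(axis=0)))
```

In this toy setup the consensus step drives the local parameter vectors toward a common value while the local TD updates drive that value toward a joint fixed point; the paper's analysis treats the interaction of these two effects rigorously via weak convergence to the associated ordinary differential equations.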
Similar resources
Applying multiagent reinforcement learning to distributed function optimization problems
Consider a set of non-cooperative agents acting in an environment in which each agent attempts to maximize a private utility function. As each agent maximizes its private utility we desire a global "world" utility function to in turn be maximized. The inverse problem induced from this situation is the following: How does each agent choose his move so that while he optimizes his private utility,...
Manifold Representations for Value-Function Approximation in Reinforcement Learning
Reinforcement learning (RL) has shown itself to be a successful paradigm for solving optimal control problems. However, that success has been mostly limited to problems with a finite set of states and actions. The problem of extending reinforcement learning techniques to the continuous state case has received quite a bit of attention in the last few years. One approach to solving reinforcement ...
Explicit Manifold Representations for Value-Function Approximation in Reinforcement Learning
We are interested in using reinforcement learning for large, real-world control problems. In particular, we are interested in problems with continuous, multi-dimensional state spaces, in which traditional reinforcement learning approaches perform poorly. Value-function approximation addresses some of the problems of traditional algorithms (for example, continuous state spaces), and has been sho...
CBR for State Value Function Approximation in Reinforcement Learning
CBR is one of the techniques that can be applied to the task of approximating a function over high-dimensional, continuous spaces. In Reinforcement Learning systems a learning agent is faced with the problem of assessing the desirability of the state it finds itself in. If the state space is very large and/or continuous the availability of a suitable mechanism to approximate a value function – ...
Statistical Linearization for Value Function Approximation in Reinforcement Learning
Reinforcement learning (RL) is a machine learning answer to the optimal control problem. It consists in learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. An important RL subtopic is to approximate this function when the system is too large for an exact representation. This paper ...
Journal
Journal title: IEEE Transactions on Control of Network Systems
Year: 2021
ISSN: 2325-5870, 2372-2533
DOI: https://doi.org/10.1109/tcns.2021.3061909